NSF PAR Search | NSF Public Access Repository

Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study

https://doi.org/10.1016/S2589-7500(23)00225-X

Zack, Travis; Lehman, Eric; Suzgun, Mirac; Rodriguez, Jorge A; Celi, Leo Anthony; Gichoya, Judy; Jurafsky, Dan; Szolovits, Peter; Bates, David W; Abdulnour, Raja-Elie E; et al. (January 2024, The Lancet Digital Health)

Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?

Lehman, Eric; Jain, Sarthak; Pichotta, Karl; Goldberg, Yoav; Wallace, Byron C (April 2021, North American Chapter of the Association for Computational Linguistics (NAACL))
Large Transformers pretrained over clinical notes from Electronic Health Records (EHR) have afforded substantial gains in performance on predictive clinical tasks. The cost of training such models (and the necessity of data access to do so) coupled with their utility motivates parameter sharing, i.e., the release of pretrained models such as ClinicalBERT. While most efforts have used deidentified EHR, many researchers have access to large sets of sensitive, non-deidentified EHR with which they might train a BERT model (or similar). Would it be safe to release the weights of such a model if they did? In this work, we design a battery of approaches intended to recover Personal Health Information (PHI) from a trained BERT. Specifically, we attempt to recover patient names and conditions with which they are associated. We find that simple probing methods are not able to meaningfully extract sensitive information from BERT trained over the MIMIC-III corpus of EHR. However, more sophisticated “attacks” may succeed in doing so. To facilitate such research, we make our experimental setup and baseline probing models available at https://github.com/elehman16/exposing_patient_data_release.
Evidence Inference 2.0: More Data, Better Models

DeYoung, Jay; Lehman, Eric; Nye, Ben; Marshall, Iain J.; Wallace, Byron C. (July 2020, BioNLP: Workshop on Biomedical Natural Language Processing)

ERASER: A Benchmark to Evaluate Rationalized NLP Models

DeYoung, Jay; Jain, Sarthak; Rajani, Nazneen Fatema; Lehman, Eric; Xiong, Caiming; Socher, Richard; Wallace, Byron C. (January 2020, Transactions of the Association for Computational Linguistics)

Inferring Which Medical Treatments Work from Reports of Clinical Trials

Lehman, Eric; DeYoung, Jay; Barzilay, Regina; Wallace, Byron C. (January 2019, Annual Conference of the North American Chapter of the Association for Computational Linguistics)

How do we know if a particular medical treatment actually works? Ideally one would consult all available evidence from relevant clinical trials. Unfortunately, such results are primarily disseminated in natural language scientific articles, imposing substantial burden on those trying to make sense of them. In this paper, we present a new task and corpus for making this unstructured evidence actionable. The task entails inferring reported findings from a full-text article describing a randomized controlled trial (RCT) with respect to a given intervention, comparator, and outcome of interest, e.g., inferring if an article provides evidence supporting the use of aspirin to reduce risk of stroke, as compared to placebo. We present a new corpus for this task comprising 10,000+ prompts coupled with full-text articles describing RCTs. Results using a suite of models, ranging from heuristic (rule-based) approaches to attentive neural architectures, demonstrate the difficulty of the task, which we believe largely owes to the lengthy, technical input texts. To facilitate further work on this important, challenging problem we make the corpus, documentation, a website and leaderboard, and code for baselines and evaluation available at http://evidence-inference.ebm-nlp.com/.